Urban Greenspace and Asthma Prevalence¶
This project examines the link between green space structure (using NDVI as an indicator of vegetation health) and human health at the census tract level. To match the spatial resolution of health data from the CDC, we analyze NDVI within individual census tracts. While coarse-resolution sensors like Landsat or MODIS could provide average NDVI values, understanding the finer spatial patterns of vegetation requires higher-resolution imagery. For this purpose, we use 1-meter resolution data from the National Agriculture Imagery Program (NAIP), which captures imagery across the continental U.S. every few years. This detailed data allows us to observe vegetation structure at the scale of individual trees, providing a more accurate picture of green space in urban environments.
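NDVI contrasts near-infrared and red reflectance; for NAIP's 4-band imagery, band 1 is red and band 4 is near-infrared. A minimal numpy sketch of the index (the toy reflectance values are illustrative only):

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Values range from -1 to 1; dense, healthy vegetation is typically > 0.3.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Toy 2x2 reflectance arrays: vegetation reflects strongly in the NIR band
nir = np.array([[0.5, 0.6], [0.2, 0.1]])
red = np.array([[0.1, 0.1], [0.2, 0.1]])
print(ndvi(nir, red))  # vegetated pixels approach 1, bare pixels stay near 0
```

The same expression appears later in this notebook, applied to NAIP bands with `clip_da.sel(band=4)` and `clip_da.sel(band=1)`.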
# Import libraries
import os
import pathlib
import time
import warnings
import zipfile
from io import BytesIO

import geopandas as gpd
import geoviews as gv
import holoviews as hv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import pystac_client
import requests
import rioxarray as rxr
import rioxarray.merge as rxrmerge
import shapely
import xarray as xr
from cartopy import crs as ccrs
from scipy.ndimage import convolve, label
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, train_test_split
from tqdm.notebook import tqdm

gv.extension('bokeh')
warnings.simplefilter('ignore')
# Prevent GDAL from quitting due to momentary disruptions
os.environ["GDAL_HTTP_MAX_RETRY"] = "5"
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1"
STEP 2: Create a site map¶
Use the Centers for Disease Control and Prevention (CDC) PLACES dataset for human health data to compare with vegetation. CDC PLACES also provides modified census tracts, clipped to city boundaries, to go along with the health data. We'll start by downloading the matching geographic data, and then select the City of Chicago.
# Define the data directory
data_dir = 'data'
os.makedirs(data_dir, exist_ok=True)
# Define output dir
tract_dir = os.path.join(data_dir, 'chicago-tract')
os.makedirs(tract_dir, exist_ok=True)
tract_path = os.path.join(tract_dir, 'chicago-tract.shp')
# Download census tracts and save the Chicago subset
if not os.path.exists(tract_path):
    tract_url = 'https://data.cdc.gov/download/x7zy-2xmx/application%2Fzip'
    # gpd.read_file can read the zipped shapefile directly from the URL
    tract_gdf = gpd.read_file(tract_url)
    chicago_tract_gdf = tract_gdf[tract_gdf['PlaceName'] == 'Chicago']
    chicago_tract_gdf.to_file(tract_path, index=False)
# Load saved file
chicago_tract_gdf = gpd.read_file(tract_path)
# Look at columns
print("Columns available:", chicago_tract_gdf.columns)
# Visualization
chicago_tract_gdf.to_crs(ccrs.Mercator()).hvplot(
line_color='orange', fill_color=None,
crs=ccrs.Mercator(), tiles='EsriImagery',
frame_width=600
)
Columns available: Index(['place2010', 'tract2010', 'ST', 'PlaceName', 'plctract10', 'PlcTrPop10',
'geometry'],
dtype='object')
City of Chicago: green space description¶
The distribution of green space in Chicago is uneven, with more concentrated vegetation on the North Side, sparse greenery in the more built-up West and South Sides, large linear parks along the lakefront, and a dense urban core with limited green coverage.
Chicago has a long history of racial and socioeconomic segregation, which has contributed to significant environmental inequality across the city. Environmental justice studies have shown that low-income communities and racial minorities tend to have less access to green spaces. For instance, a 2020 report by the Trust for Public Land highlighted major disparities in park access throughout Chicago. Urban planning has also played a key role: the lakefront was intentionally designed with public parks, and the more affluent neighborhoods on the North Side have received greater investment in green infrastructure such as trees, parks, and recreational areas. In contrast, the West and South Sides have been shaped by industrial development and have seen less green investment. These patterns have direct environmental and public health implications: areas with less vegetation tend to experience more intense urban heat island effects, poorer air quality, and higher rates of asthma and other respiratory illnesses, which directly relates to the focus of our research on green space and health.
data_dir = 'data'
os.makedirs(data_dir, exist_ok=True)
# Load census tracts for Illinois (state FIPS 17)
tracts_url = "https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_17_tract.zip"
tracts_path = os.path.join(data_dir, 'tl_2024_17_tract.shp')
if not os.path.exists(tracts_path):
    tracts_gdf = gpd.read_file(tracts_url)
    tracts_gdf.to_file(tracts_path)
else:
    tracts_gdf = gpd.read_file(tracts_path)
# Filter tracts in Cook County (county FIPS 031)
cook_tracts_gdf = tracts_gdf[tracts_gdf['COUNTYFP'] == '031']
# Download place boundaries
place_url = "https://www2.census.gov/geo/tiger/TIGER2024/PLACE/tl_2024_17_place.zip"
place_path = os.path.join(data_dir, 'tl_2024_17_place.shp')
if not os.path.exists(place_path):
    place_gdf = gpd.read_file(place_url)
    place_gdf.to_file(place_path)
else:
    place_gdf = gpd.read_file(place_path)
# Filter the City of Chicago
chicago_place_gdf = place_gdf[place_gdf['NAME'] == 'Chicago']
# Match the CRS before the spatial join
cook_tracts_gdf = cook_tracts_gdf.to_crs(chicago_place_gdf.crs)
# Spatial join: keep only tracts that intersect the Chicago boundary
chicago_tracts_gdf = gpd.sjoin(
    cook_tracts_gdf,
    chicago_place_gdf[['geometry']],
    how="inner",
    predicate="intersects"  # or "within" to require full containment
).drop(columns='index_right')
# Save results
chicago_tracts_path = os.path.join(data_dir, 'chicago_tracts_only.shp')
chicago_tracts_gdf.to_file(chicago_tracts_path)
The map shows the census tracts of Cook County, where Chicago is located. Satellite imagery (EsriImagery) is used as the background to provide urban context, and the tract boundaries are drawn without fill so the underlying imagery remains visible.
For information on how to access this data via the CDC portal, you can follow this tutorial.
You can obtain URLs for the U.S. Census Tract shapefiles from the TIGER service. You'll notice that these URLs use the state FIPS code, which you can look up, e.g. here, or obtain by installing and using the us package.
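Since the TIGER URLs embed the two-digit state FIPS code, a small helper can build them. This is a stdlib-only sketch with a hypothetical partial FIPS table; the `us` package provides the full mapping (e.g. `us.states.IL.fips`):

```python
# Hypothetical partial FIPS table; the `us` package has the complete mapping
STATE_FIPS = {'IL': '17', 'IN': '18', 'WI': '55', 'CO': '08'}

def tiger_tract_url(state_abbr, year=2024):
    """Build the TIGER census tract shapefile URL for a state."""
    fips = STATE_FIPS[state_abbr]
    return (
        f"https://www2.census.gov/geo/tiger/TIGER{year}/TRACT/"
        f"tl_{year}_{fips}_tract.zip"
    )

print(tiger_tract_url('IL'))
# https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_17_tract.zip
```

The Illinois URL this produces matches the one used above.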
STEP 3 - Access Asthma and Urban Greenspaces Data¶
# Set up a path for the asthma data
asthma_path = os.path.join(data_dir, 'asthma.csv')
# Download asthma data (only once)
if not os.path.exists(asthma_path):
    asthma_url = (
        "https://data.cdc.gov/resource/cwsq-ngmh.csv"
        "?year=2022"
        "&stateabbr=IL"
        "&countyname=Cook"
        "&measureid=CASTHMA"
        "&$limit=1500"
    )
    asthma_df = (
        pd.read_csv(asthma_url)
        .rename(columns={
            'data_value': 'asthma',
            'low_confidence_limit': 'asthma_ci_low',
            'high_confidence_limit': 'asthma_ci_high',
            'locationname': 'tract'})
        [[
            'year', 'tract',
            'asthma', 'asthma_ci_low', 'asthma_ci_high', 'data_value_unit',
            'totalpopulation', 'totalpop18plus'
        ]]
    )
    asthma_df.to_csv(asthma_path, index=False)
# Load in asthma data
asthma_df = pd.read_csv(asthma_path)
# Preview asthma data
asthma_df
| | year | tract | asthma | asthma_ci_low | asthma_ci_high | data_value_unit | totalpopulation | totalpop18plus |
|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | 17031031900 | 8.6 | 7.7 | 9.7 | % | 2522 | 2143 |
| 1 | 2022 | 17031062600 | 8.3 | 7.3 | 9.3 | % | 2477 | 1760 |
| 2 | 2022 | 17031070101 | 8.9 | 7.9 | 9.9 | % | 4171 | 3912 |
| 3 | 2022 | 17031151200 | 8.6 | 7.7 | 9.7 | % | 3880 | 3116 |
| 4 | 2022 | 17031241100 | 9.0 | 8.0 | 10.0 | % | 3574 | 3044 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1323 | 2022 | 17031834900 | 13.5 | 12.1 | 15.0 | % | 1952 | 1451 |
| 1324 | 2022 | 17031828601 | 9.8 | 8.8 | 11.0 | % | 4198 | 3227 |
| 1325 | 2022 | 17031843700 | 8.4 | 7.5 | 9.5 | % | 2544 | 1891 |
| 1326 | 2022 | 17031829700 | 10.8 | 9.6 | 12.1 | % | 3344 | 2524 |
| 1327 | 2022 | 17031829100 | 10.2 | 9.1 | 11.5 | % | 3512 | 2462 |
1328 rows × 8 columns
print(asthma_df.columns)
Index(['year', 'tract', 'asthma', 'asthma_ci_low', 'asthma_ci_high',
'data_value_unit', 'totalpopulation', 'totalpop18plus'],
dtype='object')
Join health data with census tract boundaries¶
# Change tract identifier datatype for merging
chicago_tracts_gdf['GEOID'] = chicago_tracts_gdf['GEOID'].astype('int64')
asthma_df['tract'] = asthma_df['tract'].astype('int64')
# Merge census data with geometry
tract_cdc_gdf = chicago_tracts_gdf.merge(
asthma_df,
left_on='GEOID',
right_on='tract',
how='inner'
)
# Plot asthma data as a choropleth
(
gv.tile_sources.EsriImagery *
gv.Polygons(
tract_cdc_gdf.to_crs(ccrs.Mercator()),
vdims=['asthma', 'GEOID'],
crs=ccrs.Mercator()
).opts(
color='asthma',
colorbar=True,
tools=['hover'],
line_color='black',
fill_alpha=0.6,
width=600,
height=600,
title='Asthma Prevalence by Census Tract in Chicago'
)
).opts(xaxis=None, yaxis=None)
The asthma prevalence data was obtained from the Centers for Disease Control and Prevention (CDC) PLACES project, which provides model-based estimates of chronic disease indicators at the census tract level across the United States. The data used here represent the percentage of adults with current asthma in 2022 for census tracts in Cook County, Illinois. Map interpretation: higher asthma prevalence (darker areas on the map) tends to be concentrated in the South Side and parts of the West Side of Chicago, while lower asthma rates are generally seen in the North Side, especially near the lakefront and in wealthier neighborhoods. Possible causes: environmental injustice, since these areas are exposed to air pollution, industrial activity, and major highways; less tree cover and fewer parks, which contributes to poorer air quality; and historic segregation and redlining.
Citation¶
Centers for Disease Control and Prevention. PLACES: Local Data for Better Health, 2022 release. Available from: https://data.cdc.gov/resource/cwsq-ngmh.csv
# Connect to the planetary computer catalog
e84_catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1"
)
e84_catalog.title
'Microsoft Planetary Computer STAC API'
Using a loop, search for NAIP scenes for each census tract¶
# Convert geometry to lat/lon for STAC
tract_latlon_gdf = tract_cdc_gdf.to_crs(4326)
# Define a path to save NDVI stats
ndvi_stats_path = os.path.join(data_dir, 'chicago-ndvi-stats.csv')
# Check for existing data - do not access duplicate tracts
downloaded_tracts = []
if os.path.exists(ndvi_stats_path):
    ndvi_stats_df = pd.read_csv(ndvi_stats_path)
    downloaded_tracts = ndvi_stats_df.tract.values
else:
    print('No census tracts downloaded so far')
# Loop through each census tract
scene_dfs = []
for _, tract_values in tqdm(tract_latlon_gdf.iterrows()):
    tract = tract_values['GEOID']  # GEOID is the tract identifier
    # Check if statistics are already downloaded for this tract
    if not (tract in downloaded_tracts):
        # Retry up to 5 times in case of a momentary disruption
        retry = 0
        retry_limit = 5
        while retry < retry_limit:
            # Try accessing the STAC API
            try:
                # Search for NAIP tiles intersecting the tract
                naip_search = e84_catalog.search(
                    collections=["naip"],
                    intersects=shapely.to_geojson(tract_values.geometry),
                    datetime="2021"
                )
                # Build a dataframe with the tract and tile urls
                scene_dfs.append(pd.DataFrame(dict(
                    tract=tract,
                    date=[pd.to_datetime(scene.datetime).date()
                          for scene in naip_search.items()],
                    rgbir_href=[scene.assets['image'].href
                                for scene in naip_search.items()],
                )))
                break
            # Try again in case of an APIError
            except pystac_client.exceptions.APIError:
                print(
                    f'Could not connect with STAC server. '
                    f'Retrying tract {tract}...'
                )
                time.sleep(2)
                retry += 1
                continue
# Concatenate the url dataframes
if scene_dfs:
    scene_df = pd.concat(scene_dfs).reset_index(drop=True)
else:
    scene_df = None
# Preview the URL DataFrame
scene_df
No census tracts downloaded so far
0it [00:00, ?it/s]
| | tract | date | rgbir_href |
|---|---|---|---|
| 0 | 17031030101 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1 | 17031030701 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 2 | 17031070103 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 3 | 17031807900 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 4 | 17031807900 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| ... | ... | ... | ... |
| 1422 | 17031283200 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1423 | 17031100500 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1424 | 17031100500 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1425 | 17031710300 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1426 | 17031710300 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
1427 rows × 3 columns
Compute NDVI Statistics¶
# Convert geometry to lat/lon for STAC
tract_latlon_gdf = tract_cdc_gdf.to_crs(4326)
# Define a path to save the NDVI statistics
ndvi_stats_path = os.path.join(data_dir, 'chicago-ndvi-stats.csv')
# Check for already-downloaded tracts
downloaded_tracts = []
if os.path.exists(ndvi_stats_path):
    ndvi_stats_df = pd.read_csv(ndvi_stats_path)
    downloaded_tracts = ndvi_stats_df['tract'].astype(str).values
else:
    print('No census tracts downloaded so far')
# Skip if there are no scenes to process
if scene_df is not None:
    for tract, tract_date_gdf in tqdm(scene_df.groupby('tract')):
        tract = str(tract)
        if tract in downloaded_tracts:
            continue
        tile_das = []
        for _, href_s in tract_date_gdf.iterrows():
            try:
                # Open the image from the STAC asset URL
                tile_da = rxr.open_rasterio(
                    href_s.rgbir_href, masked=True
                ).squeeze()
                # Load the tract geometry and reproject it to the tile CRS
                boundary = (
                    tract_cdc_gdf
                    .set_index('GEOID')
                    .loc[[int(tract)]]
                    .to_crs(tile_da.rio.crs)
                    .geometry
                )
                # Crop to the tract bounding box, then clip to the boundary
                crop_da = tile_da.rio.clip_box(
                    *boundary.envelope.total_bounds,
                    auto_expand=True
                )
                clip_da = crop_da.rio.clip(boundary, all_touched=True)
                # Calculate NDVI (NAIP band 4 is NIR, band 1 is red)
                ndvi_da = (
                    (clip_da.sel(band=4) - clip_da.sel(band=1))
                    / (clip_da.sel(band=4) + clip_da.sel(band=1))
                )
                tile_das.append(ndvi_da)
            except Exception as e:
                print(f"Error in tract {tract}: {e}")
                continue
        if not tile_das:
            continue
        # Merge scenes (average overlapping tiles)
        scene_da = (
            tile_das[0] if len(tile_das) == 1
            else xr.concat(tile_das, dim='band').mean(dim='band')
        )
        # Vegetation mask (NDVI > 0.3)
        veg_mask = (scene_da > 0.3)
        # Summary statistics
        total_pixels = scene_da.notnull().sum().item()
        veg_pixels = veg_mask.sum().item()
        # Calculate mean vegetation patch size
        labeled_patches, num_patches = label(veg_mask)
        patch_sizes = np.bincount(labeled_patches.ravel())[1:]  # Ignore background
        mean_patch_size = patch_sizes.mean() if len(patch_sizes) > 0 else 0
        # Calculate edge density with a Laplacian-style kernel
        kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])
        edges = convolve(veg_mask.astype(int), kernel, mode='constant')
        edge_density = np.sum(edges != 0) / veg_mask.size
        # Append results for this tract
        pd.DataFrame(dict(
            tract=[tract],
            total_pixels=[int(total_pixels)],
            frac_veg=[float(veg_pixels / total_pixels)],
            mean_patch_size=[mean_patch_size],
            edge_density=[edge_density]
        )).to_csv(
            ndvi_stats_path,
            mode='a',
            index=False,
            header=(not os.path.exists(ndvi_stats_path))
        )
# Reload results from file
ndvi_stats_df = pd.read_csv(ndvi_stats_path)
ndvi_stats_df.head()
No census tracts downloaded so far
100%|██████████| 867/867 [1:32:22<00:00, 6.39s/it]
| | tract | total_pixels | frac_veg | mean_patch_size | edge_density |
|---|---|---|---|---|---|
| 0 | 17031010100 | 1056066 | 0.177363 | 58.169876 | 0.116155 |
| 1 | 17031010201 | 1518544 | 0.227344 | 31.608863 | 0.206110 |
| 2 | 17031010202 | 977180 | 0.186742 | 63.493737 | 0.123823 |
| 3 | 17031010300 | 1314286 | 0.190898 | 56.904967 | 0.126692 |
| 4 | 17031010400 | 3591474 | 0.094179 | 38.258455 | 0.063082 |
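To make the patch and edge statistics above concrete, here is a toy example on a small boolean vegetation mask, using the same scipy functions; the mask and numbers are illustrative only:

```python
import numpy as np
from scipy.ndimage import convolve, label

# Toy 4x4 vegetation mask containing two separate patches
veg_mask = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=bool)

# Label connected patches (4-connectivity by default)
labeled, num_patches = label(veg_mask)
patch_sizes = np.bincount(labeled.ravel())[1:]  # drop the background count
mean_patch_size = patch_sizes.mean()

# Edge density: the Laplacian-style kernel is nonzero wherever a pixel's
# 3x3 neighborhood is not uniform, i.e. at vegetation edges
kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])
edges = convolve(veg_mask.astype(int), kernel, mode='constant')
edge_density = np.sum(edges != 0) / veg_mask.size

print(num_patches, patch_sizes, mean_patch_size)  # 2 [3 4] 3.5
```

A tract dominated by one large park gives a large mean patch size and low edge density, while scattered street trees give many small patches and high edge density.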
# Merge census tract, asthma, and NDVI data
chi_ndvi_cdc_gdf = (
    tract_cdc_gdf
    .merge(
        ndvi_stats_df,
        left_on='GEOID',  # GEOID matches the tract column in the NDVI stats
        right_on='tract',
        how='inner'
    )
)
# Plot choropleths with vegetation statistics
def plot_chloropleth(gdf, **opts):
    """Generate a choropleth with the given color column"""
    return gv.Polygons(
        gdf.to_crs(ccrs.Mercator()),
        crs=ccrs.Mercator()
    ).opts(
        xaxis=None,
        yaxis=None,
        colorbar=True,
        line_color='black',
        width=600,
        height=600,
        tools=['hover'],
        **opts
    )
# Show maps asthma and edge_density
(
plot_chloropleth(chi_ndvi_cdc_gdf, color='asthma', cmap='viridis', title='Asthma Prevalence')
+
plot_chloropleth(chi_ndvi_cdc_gdf, color='edge_density', cmap='Greens', title='Edge Density (NDVI > 0.3)')
)
The visualizations reveal a clear spatial contrast between asthma prevalence and vegetation structure across Chicago. Areas with higher asthma rates—particularly in the South and West Sides—tend to correspond with tracts showing lower vegetation edge density. In contrast, the North Side and lakefront neighborhoods exhibit both lower asthma prevalence and higher vegetation complexity. This pattern suggests a potential inverse relationship between vegetation structure and asthma: tracts with less fragmented or sparse vegetation may contribute to poorer air quality and reduced environmental buffering, which are known factors in respiratory health. These findings align with research on environmental injustice, where historically marginalized communities often face both reduced access to green space and increased exposure to air pollutants. Therefore, the maps support the idea that vegetation distribution and quality may play a role in shaping health disparities in urban environments like Chicago.
STEP 4: Explore a linear ordinary least-squares regression¶
Model description¶
This linear regression model explores whether vegetation structure—specifically edge_density (the complexity of vegetated areas) and frac_veg (fraction of green cover)—can explain variation in adult asthma prevalence across census tracts in Chicago. The model assumes a linear relationship, normally distributed errors, and independent observations. We assess model performance using R² to understand how much of the variability in asthma is explained by vegetation metrics, and p-values to test statistical significance. The model is straightforward and interpretable, making it useful for identifying broad patterns. However, it does not account for other key factors such as air pollution, socioeconomic status, or spatial autocorrelation, which could influence both asthma rates and green space distribution.
# Variable selection
model_df = (
    chi_ndvi_cdc_gdf
    [['frac_veg', 'mean_patch_size', 'edge_density', 'asthma', 'geometry']]
    .dropna()  # Drop census tracts without data
    .copy()
)
# Log-transform asthma to normalize its distribution
model_df['log_asthma'] = np.log(model_df['asthma'])
# Scatter matrix to explore relationships between variables
hvplot.scatter_matrix(
model_df[['frac_veg', 'mean_patch_size', 'edge_density', 'log_asthma']],
width=500,
height=500,
diagonal='hist'
)
I selected vegetation structure variables (frac_veg, mean_patch_size, edge_density) along with asthma prevalence to evaluate linear relationships. I applied a log transformation to the asthma variable to reduce right skew and better meet the assumption of normally distributed residuals for linear regression. I dropped rows with missing values using .dropna() to ensure a complete dataset for modeling. I used hvplot.scatter_matrix(), as the manual suggests, to explore potential correlations, check for skewed distributions, and identify predictors that might require further transformation or standardization.
Next, check the distributions of the predictors frac_veg, mean_patch_size, and edge_density: if the histograms are strongly skewed, a transformation may be necessary. If the scales are very different, standardize the variables, for example with StandardScaler. Finally, check for multicollinearity among the predictors.
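Both checks can be sketched with numpy alone: standardize each column, then compute a variance inflation factor (VIF) per predictor (statsmodels offers `variance_inflation_factor` for the same purpose). This is a minimal sketch on synthetic data, not the project's actual predictors:

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (like StandardScaler)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def vif(X):
    """VIF per column: 1 / (1 - R^2) when regressing that column on the
    others. Values above roughly 5-10 signal multicollinearity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return np.array(out)

# Synthetic predictors: x2 is nearly collinear with x1, x3 is independent
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = standardize(np.column_stack([x1, x2, x3]))
print(vif(X))  # first two VIFs are large, the third stays near 1
```

For this project, the same check would be run on model_df[['frac_veg', 'mean_patch_size', 'edge_density']].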
Fit and Predict¶
# Select predictor and outcome variables
X = model_df[['edge_density', 'mean_patch_size']]
y = model_df[['log_asthma']]
# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
# Fit a linear regression
reg = LinearRegression()
reg.fit(X_train, y_train)
# Predict asthma values for the test dataset
# (the outcome is log-transformed, so back-transform with np.exp)
y_test = y_test.copy()  # Avoid SettingWithCopy warnings
y_test['pred_asthma'] = np.exp(reg.predict(X_test))
y_test['asthma'] = np.exp(y_test['log_asthma'])
# Plot measured vs. predicted asthma prevalence with a 1-to-1 line
y_max = y_test['asthma'].max()
(
y_test.hvplot.scatter(
x='asthma', y='pred_asthma',
xlabel='Measured Adult Asthma Prevalence',
ylabel='Predicted Adult Asthma Prevalence',
title='Linear Regression Performance - Testing Data',
size=6
)
.opts(aspect='equal', xlim=(0, y_max), ylim=(0, y_max), height=600, width=600)
* hv.Slope(slope=1, y_intercept=0).opts(color='black', line_dash='dashed')
)
# Calculate RMSE and R²
from sklearn.metrics import mean_squared_error, r2_score
rmse = mean_squared_error(y_test['asthma'], y_test['pred_asthma'], squared=False)
r2 = r2_score(y_test['asthma'], y_test['pred_asthma'])
rmse, r2
(np.float64(1.6155298784012504), 0.07869119509576028)
Interpreting the metrics: the RMSE of about 1.62 percentage points is a relatively high average error when predicting asthma prevalence, considering that tract-level rates range from roughly 6% to 15%. The coefficient of determination indicates that only 7.9% of the variability in asthma values is explained by the model's predictors, edge_density and mean_patch_size, so the model is not capturing the relationship between these variables and asthma prevalence well. The working hypothesis linking vegetation and asthma remains plausible, but these variables alone do not carry enough of the spatial pattern to explain asthma in Chicago adequately; variables such as air quality, an important factor to consider, are missing.
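As a sanity check, both metrics can be computed directly from their definitions; a small numpy sketch with toy values (not the model's actual predictions):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the typical size of a prediction error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """R²: the fraction of variance in y_true explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Toy asthma prevalences (%) and predictions that are each off by at most 1
y_true = [8.0, 10.0, 12.0]
y_pred = [9.0, 10.0, 11.0]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))  # ~0.816 and 0.75
```

This makes the earlier numbers easier to read: an RMSE near 1.6 means predictions are typically off by about 1.6 percentage points, and an R² near 0.08 means the model barely improves on predicting the mean everywhere.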
Spatial bias¶
We always need to think about bias, or systematic error, in model results. Every model is going to have some error, but we’d like to see that error evenly distributed. When the error is systematic, it can be an indication that we are missing something important in the model.
In geographic data, it is common for location to be a factor that doesn’t get incorporated into models. After all – we generally expect places that are right next to each other to be more similar than places that are far away (this phenomenon is known as spatial autocorrelation). However, models like this linear regression don’t take location into account at all.
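A standard diagnostic for spatial autocorrelation is Moran's I. This is a numpy-only sketch assuming a precomputed binary adjacency (spatial weights) matrix; libraries such as PySAL build these weights from real tract geometries:

```python
import numpy as np

def morans_i(values, W):
    """Moran's I: (n / sum(W)) * (z' W z) / (z' z), where z = x - mean(x).

    Positive I means similar values cluster in space; negative I means
    neighboring values tend to alternate; near zero means no spatial pattern.
    """
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(W, dtype=float)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Four locations in a line, each adjacent to the next (chain adjacency)
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
print(morans_i([1, 2, 3, 4], W))  # positive: smoothly varying values cluster
print(morans_i([1, 0, 1, 0], W))  # negative: alternating values
```

Applying this to the model residuals (with tract adjacency as the weights) would quantify the spatial pattern of error that the map below shows visually.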
# Compute model error for all census tracts
model_df['pred_asthma'] = np.exp(reg.predict(X))
model_df['err_asthma'] = model_df['pred_asthma'] - model_df['asthma']
# Plot error geographically as a choropleth
(
plot_chloropleth(model_df, color='err_asthma', cmap='RdBu')
.redim.range(err_asthma=(-3, 3))  # Adjust the displayed range if necessary
.opts(
title='Prediction Error: Predicted - Measured Asthma',
frame_width=600,
aspect='equal',
tools=['hover'],
colorbar=True
)
)
The colors on the map represent the model error, that is, the difference between predicted and measured asthma prevalence: Error = Predicted Asthma - Observed Asthma. The model tends to underestimate asthma prevalence on the South Side and in some areas of the West Side of Chicago (shown in red), and to overestimate it on the North Side (areas in blue). In other words, the model's errors follow a clear spatial pattern, which suggests that important, geographically varying drivers of asthma prevalence are missing: for example, air pollution (particulate matter, NO2), industrial zones, and socioeconomic indicators. Relying only on NDVI-derived metrics, within a purely linear model, is therefore a limitation. As already indicated, the model could be improved with more data sources, such as air quality, land use information, and redlining maps, in addition to standardizing the predictors.